Abstract —In this paper, we study the sensitivity of the spectral clustering based community detection algorithm subject to an Erdős-Rényi type random noise model. We prove phase transitions in community detectability as a function of the external edge connection probability and the noisy edge presence probability under a general network model where two arbitrarily connected communities are interconnected by random external edges. Specifically, the community detection performance transitions from almost perfect detectability to low detectability as the inter-community edge connection probability exceeds some critical value. We derive upper and lower bounds on the critical value and show that the bounds are identical when the two communities have the same size. The phase transition results are validated using network simulations. Using the derived expressions for the phase transition threshold, we propose a method for estimating this threshold from observed data.

Index Terms —community detectability, noisy graph 
I. Introduction 

Community detection is a graph signal processing problem [1]-[9] where the goal is to cluster the nodes on a graph into different communities by inspecting the connectivity structure of the graph. Consider an undirected regular graph consisting of two node-disjoint communities interconnected by some external edges. Let $n$ denote the total number of nodes in the network. The network topology can be characterized by its symmetric adjacency matrix $A$, where $A$ is an $n \times n$ matrix, with $A_{ij} = 1$ if an edge exists between nodes $i$ and $j$, and $A_{ij} = 0$ otherwise.

Since community detection can be viewed as a graph partitioning problem that can be solved by identifying the graph cut that correctly separates the communities, spectral clustering [?] approaches to community detection are natural [?]. Spectral clustering specifies a graph cut by inspecting the eigenstructure of the graph. Let $\mathbf{1}_n$ ($\mathbf{0}_n$) be the $n$-dimensional all-one (all-zero) vector. Define $L = D - A$ as the graph Laplacian matrix of the graph, where $D = \mathrm{diag}(A\mathbf{1}_n)$ is the diagonal degree matrix. Let $\lambda_i(L)$ denote the $i$-th smallest eigenvalue of $L$. It is well known that $\lambda_1(L) = 0$ since $L\mathbf{1}_n = \mathbf{0}_n$ and $L$ is a positive semidefinite (PSD) matrix [16], [17]. The second smallest eigenvalue, $\lambda_2(L)$, is known as the algebraic connectivity. The eigenvector associated with $\lambda_2(L)$ is called the Fiedler vector [?]. A mathematical representation of the algebraic connectivity is

$$\lambda_2(L) = \min_{\|x\|_2 = 1,\ \mathbf{1}_n^T x = 0} x^T L x. \tag{1}$$

The principle underlying spectral clustering for community detection [?] is summarized as follows:

1) Compute the graph Laplacian matrix $L = D - A$.

2) Compute the Fiedler vector $y$.

3) Perform K-means clustering [?] on the entries of $y$ to cluster the nodes into two groups. To detect more than two communities, we can use successive spectral clustering on the discovered communities [?].
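The three steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes NumPy, a dense symmetric adjacency matrix, and a plain two-cluster K-means on the scalar entries of the Fiedler vector that is initialised at the smallest and largest entries. The function names `fiedler_vector`, `two_means_1d` and `spectral_bipartition` are hypothetical.

```python
import numpy as np


def fiedler_vector(A):
    """Return (lambda_2, y) for the Laplacian L = D - A of a symmetric adjacency matrix."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A            # step 1: graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)      # eigenvalues returned in ascending order
    return eigvals[1], eigvecs[:, 1]          # step 2: second-smallest eigenpair


def two_means_1d(y, iters=100):
    """Two-cluster K-means on scalars (assumption: centers start at min(y) and max(y))."""
    centers = np.array([y.min(), y.max()], dtype=float)
    labels = np.zeros(len(y), dtype=int)
    for _ in range(iters):
        # Assign each entry to the nearer of the two centers.
        labels = (np.abs(y - centers[1]) < np.abs(y - centers[0])).astype(int)
        new_centers = np.array([
            y[labels == k].mean() if np.any(labels == k) else centers[k]
            for k in (0, 1)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_bipartition(A):
    """Step 3: cluster the entries of the Fiedler vector into two groups."""
    lam2, y = fiedler_vector(A)
    return two_means_1d(y), lam2
```

For two triangles joined by a single edge, for instance, the sketch should place each triangle in its own group, since the Fiedler vector of that graph is antisymmetric across the bridge.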

Most literature on community detectability [?]-[28] focuses on the noiseless setting where the edges are not subject to random insertions or deletions. However, in practice the network data can be corrupted by incorrect measurements or background noise (e.g., bio-informatics data) that can produce such random insertions and deletions. Consequently, analyzing the sensitivity of community detection algorithms to noise is an important task. In this paper, we prove the existence of abrupt phase transitions in community detectability for spectral community detection under an Erdős-Rényi type random noise model. Our network model includes the widely used stochastic block model [29] as a special case. We show that at
some critical value of random external edge connection probability 
the community detection performance transitions from almost perfect 
detectability to low detectability in the large network limit (large n). 
We provide asymptotic upper and lower bounds on this critical value. 
The bounds become equal to each other when these two community 
sizes are identical. This framework can be generalized to community 
detection on more than two communities by aggregating multiple 
communities into two larger communities. 

We use simulated networks to validate the asymptotic expressions 
for the phase transitions. Using our theory, we propose an empirical 
estimator of the critical phase transition threshold that can be applied 
to data. These empirical estimates are used to test whether the detector 
is operating in a reliable detection regime, i.e., below the phase 
transition threshold. 


II. Network Model and Related Works 

Consider two arbitrarily connected communities with internal adjacency matrices $A_{S_1}$ and $A_{S_2}$ and network sizes $n_1$ and $n_2$, respectively. The external connections between these two communities are characterized by an $n_1 \times n_2$ adjacency matrix $C_S$, where each entry in $C_S$ is a Bernoulli($p$) random variable. Let $n = n_1 + n_2$. The overall $n \times n$ adjacency matrix of the community structure can be represented as

$$A_S = \begin{bmatrix} A_{S_1} & C_S \\ C_S^T & A_{S_2} \end{bmatrix}. \tag{2}$$

The widely used stochastic block model [29] is a special case of (2) when the two community structures are generated by connected Erdős-Rényi random graphs parameterized by the within-community connection probability $p_i$ ($i = 1, 2$). Our network model is more general since we only assume random connection probability $p$ on the external edges and we allow the within-community adjacency matrices $A_{S_i}$ to be arbitrary. In this paper we consider the noisy setting in which the adjacency matrix $A_S$ is corrupted by a random adjacency matrix $A_N$ such that the observed adjacency matrix is $A = A_S + A_N$. The adjacency matrix $A_N$ is generated by an Erdős-Rényi random graph with edge connection probability $q$. Note that this model only allows random insertions and not deletions of edges.
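As an illustration of the model, the following sketch draws one realization of $A = A_S + A_N$ for the stochastic block model special case. It is a sketch under stated assumptions: NumPy is assumed, the within-community graphs are drawn as Erdős-Rényi graphs without enforcing connectivity, and the name `noisy_two_community_graph` and its parameters are hypothetical. Because the observed matrix is the plain sum, a pair that is an edge in both $A_S$ and $A_N$ receives the entry 2.

```python
import numpy as np


def symmetric_bernoulli(n, prob, rng):
    """Symmetric 0/1 matrix with zero diagonal; each node pair is an edge with probability prob."""
    upper = np.triu(rng.random((n, n)) < prob, k=1).astype(int)
    return upper + upper.T


def noisy_two_community_graph(n1, n2, p1, p2, p, q, seed=0):
    """Return (A, A_S, A_N) with A = A_S + A_N, following the block structure in (2)."""
    rng = np.random.default_rng(seed)
    n = n1 + n2
    A_S = np.zeros((n, n), dtype=int)
    A_S[:n1, :n1] = symmetric_bernoulli(n1, p1, rng)   # community 1 (assumed Erdos-Renyi)
    A_S[n1:, n1:] = symmetric_bernoulli(n2, p2, rng)   # community 2 (assumed Erdos-Renyi)
    C_S = (rng.random((n1, n2)) < p).astype(int)       # Bernoulli(p) external edges
    A_S[:n1, n1:] = C_S
    A_S[n1:, :n1] = C_S.T
    A_N = symmetric_bernoulli(n, q, rng)               # Erdos-Renyi noise on all node pairs
    return A_S + A_N, A_S, A_N
```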

Community detectability has been studied under the stochastic block model with restricted assumptions such as $n_1 = n_2$, $p_1 = p_2$ and fixed average degree as the network size $n$ increases [?]. The planted clique detection problem in [?] is a special case of the stochastic block model when $p_1 = 1$ and $p_2 = p$. A less restricted stochastic block model is studied in [28], where a universal phase transition in community detectability is established for which the critical value does not depend on the community sizes. A model similar to our network model is studied in [32] for interconnected networks. However, in [32] the subnetworks are of equal size and the external edges are known (i.e., non-random). Phase transitions in spectral community detection under the noiseless network setting are studied in [27].


III. Phase Transition Analysis 


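Let $\mathbf{1}_{n_i}$ be the $n_i$-dimensional all-one vector and let $D_{S_1} = \mathrm{diag}(C_S \mathbf{1}_{n_2})$ and $D_{S_2} = \mathrm{diag}(C_S^T \mathbf{1}_{n_1})$. The graph Laplacian matrix of the noiseless graph can be represented as

$$L_S = \begin{bmatrix} L_{S_1} + D_{S_1} & -C_S \\ -C_S^T & L_{S_2} + D_{S_2} \end{bmatrix}, \tag{3}$$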


where $L_{S_i}$ is the graph Laplacian matrix of the $i$-th community. Similarly, the graph Laplacian matrix of the noise matrix can be represented as

$$L_N = \begin{bmatrix} L_{N_1} + D_{N_1} & -C_N \\ -C_N^T & L_{N_2} + D_{N_2} \end{bmatrix}, \tag{4}$$

where $L_{N_i}$ is the graph Laplacian matrix of the noise matrix in the $i$-th community, $C_N$ is the adjacency matrix of noisy edges between the two communities, $D_{N_1} = \mathrm{diag}(C_N \mathbf{1}_{n_2})$ and $D_{N_2} = \mathrm{diag}(C_N^T \mathbf{1}_{n_1})$. Therefore the overall graph Laplacian matrix is $L = L_S + L_N$.

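Let $x = [x_1\ x_2]^T$, where $x_1 \in \mathbb{R}^{n_1}$ and $x_2 \in \mathbb{R}^{n_2}$. By (1) we have $\lambda_2(L) = \min_x x^T L x$ subject to the constraints $x_1^T x_1 + x_2^T x_2 = 1$ and $x_1^T \mathbf{1}_{n_1} + x_2^T \mathbf{1}_{n_2} = 0$. Using Lagrange multipliers $\mu$ and $\nu$, the Fiedler vector $y = [y_1\ y_2]^T$ of $L$, with $y_1 \in \mathbb{R}^{n_1}$ and $y_2 \in \mathbb{R}^{n_2}$, satisfies $y = \arg\min_x \Gamma(x)$, where

$$\begin{aligned} \Gamma(x) ={}& x_1^T (L_{S_1} + D_{S_1} + L_{N_1} + D_{N_1}) x_1 - 2 x_1^T (C_S + C_N) x_2 \\ &+ x_2^T (L_{S_2} + D_{S_2} + L_{N_2} + D_{N_2}) x_2 \\ &- \mu (x_1^T x_1 + x_2^T x_2 - 1) - \nu (x_1^T \mathbf{1}_{n_1} + x_2^T \mathbf{1}_{n_2}). \end{aligned} \tag{5}$$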


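Differentiating (5) with respect to $x_1$ and $x_2$ respectively, and substituting $y$ into the equations, we obtain

$$2(L_{S_1} + D_{S_1} + L_{N_1} + D_{N_1}) y_1 - 2(C_S + C_N) y_2 - 2\mu y_1 - \nu \mathbf{1}_{n_1} = \mathbf{0}_{n_1}, \tag{6}$$

$$2(L_{S_2} + D_{S_2} + L_{N_2} + D_{N_2}) y_2 - 2(C_S + C_N)^T y_1 - 2\mu y_2 - \nu \mathbf{1}_{n_2} = \mathbf{0}_{n_2}. \tag{7}$$

Left multiplying (6) by $\mathbf{1}_{n_1}^T$ and left multiplying (7) by $\mathbf{1}_{n_2}^T$, and noting that the Laplacian terms vanish because $\mathbf{1}_{n_i}^T L_{S_i} = \mathbf{1}_{n_i}^T L_{N_i} = \mathbf{0}_{n_i}^T$, we have

$$2\,\mathbf{1}_{n_1}^T (D_{S_1} + D_{N_1}) y_1 - 2\,\mathbf{1}_{n_1}^T (C_S + C_N) y_2 - 2\mu\, \mathbf{1}_{n_1}^T y_1 - \nu n_1 = 0, \tag{8}$$

$$2\,\mathbf{1}_{n_2}^T (D_{S_2} + D_{N_2}) y_2 - 2\,\mathbf{1}_{n_2}^T (C_S + C_N)^T y_1 - 2\mu\, \mathbf{1}_{n_2}^T y_2 - \nu n_2 = 0. \tag{9}$$

Since by definition $\mathbf{1}_{n_1}^T D_{S_1} = \mathbf{1}_{n_2}^T C_S^T$, $\mathbf{1}_{n_1}^T C_S = \mathbf{1}_{n_2}^T D_{S_2}$, $\mathbf{1}_{n_1}^T D_{N_1} = \mathbf{1}_{n_2}^T C_N^T$ and $\mathbf{1}_{n_1}^T C_N = \mathbf{1}_{n_2}^T D_{N_2}$, adding (8) and (9) we obtain $\nu = -\frac{2\mu}{n}(y_1^T \mathbf{1}_{n_1} + y_2^T \mathbf{1}_{n_2}) = 0$ by the fact that the Fiedler vector $y$ has the property $y^T \mathbf{1}_n = 0$. Applying $\nu = 0$ and left multiplying (6) by $y_1^T$ and left multiplying (7) by $y_2^T$, we have

$$y_1^T (L_{S_1} + D_{S_1} + L_{N_1} + D_{N_1}) y_1 - y_1^T (C_S + C_N) y_2 - \mu\, y_1^T y_1 = 0, \tag{10}$$

$$y_2^T (L_{S_2} + D_{S_2} + L_{N_2} + D_{N_2}) y_2 - y_2^T (C_S + C_N)^T y_1 - \mu\, y_2^T y_2 = 0. \tag{11}$$

Adding (10) and (11) and using $y^T y = 1$ together with (1), we obtain $\mu = y^T L y = \lambda_2(L)$.

Let $\bar{C}_S = p \mathbf{1}_{n_1} \mathbf{1}_{n_2}^T$, a matrix whose elements are the means of entries in $C_S$. Let $\sigma_i(M)$ denote the $i$-th largest singular value of a rectangular matrix $M$ (see footnote 1), and write $C_S = \bar{C}_S + \Delta_S$, where $\Delta_S = C_S - \bar{C}_S$. By Latala's theorem [33], $E\left[\sigma_1\left(\frac{\Delta_S}{\sqrt{n_1 n_2}}\right)\right] \to 0$. This is proved in Appendix VII-A of [27]. Furthermore, by Talagrand's concentration inequality [34], almost surely,

$$\sigma_1\left(\frac{C_S}{\sqrt{n_1 n_2}}\right) \xrightarrow{a.s.} p; \qquad \sigma_i\left(\frac{C_S}{\sqrt{n_1 n_2}}\right) \xrightarrow{a.s.} 0 \quad \forall\, i \ge 2 \tag{12}$$

when $n_1, n_2 \to \infty$ and $\frac{n_1}{n_2} \to c > 0$. This is proved in Appendix VII-B of [27]. Note that the convergence rate is maximal when $n_1 = n_2$ because $n_1 + n_2 \ge 2\sqrt{n_1 n_2}$ and the equality holds if $n_1 = n_2$. Similarly, let $\bar{C}_N = q \mathbf{1}_{n_1} \mathbf{1}_{n_2}^T$, a matrix whose elements are the means of entries in $C_N$. We have $\sigma_1\left(\frac{C_N}{\sqrt{n_1 n_2}}\right) \xrightarrow{a.s.} q$ and $\sigma_i\left(\frac{C_N}{\sqrt{n_1 n_2}}\right) \xrightarrow{a.s.} 0$ $\forall\, i \ge 2$ when $n_1, n_2 \to \infty$ and $\frac{n_1}{n_2} \to c > 0$.

As proved in [?], the singular vectors of $C_S$ ($C_N$) and $\bar{C}_S$ ($\bar{C}_N$) are close to each other in the sense that the squared inner product of their left/right singular vectors converges to 1 almost surely when $\sqrt{n_1 n_2}\, p \to \infty$ ($\sqrt{n_1 n_2}\, q \to \infty$). Consequently, we have, almost surely,

$$\frac{(D_{S_1} + D_{N_1}) \mathbf{1}_{n_1}}{n_2} = \frac{(C_S + C_N) \mathbf{1}_{n_2}}{n_2} \to (p + q) \mathbf{1}_{n_1}; \tag{13}$$

$$\frac{(D_{S_2} + D_{N_2}) \mathbf{1}_{n_2}}{n_1} = \frac{(C_S + C_N)^T \mathbf{1}_{n_1}}{n_1} \to (p + q) \mathbf{1}_{n_2}. \tag{14}$$

Applying (12), (13) and (14) to (8) and (9) and recalling that $\nu = 0$ and $\frac{n_1}{n_2} \to c > 0$, we have, almost surely,

$$\frac{1}{\sqrt{c}} (p + q) \mathbf{1}_{n_1}^T y_1 - \sqrt{c} (p + q) \mathbf{1}_{n_2}^T y_2 - \frac{\mu}{\sqrt{n_1 n_2}} \mathbf{1}_{n_1}^T y_1 \to 0; \tag{15}$$

$$\sqrt{c} (p + q) \mathbf{1}_{n_2}^T y_2 - \frac{1}{\sqrt{c}} (p + q) \mathbf{1}_{n_1}^T y_1 - \frac{\mu}{\sqrt{n_1 n_2}} \mathbf{1}_{n_2}^T y_2 \to 0. \tag{16}$$

By the fact that $\mathbf{1}_{n_1}^T y_1 + \mathbf{1}_{n_2}^T y_2 = 0$, we have, almost surely,

$$\left(\sqrt{c} + \frac{1}{\sqrt{c}}\right) \left(p + q - \frac{\mu}{n}\right) \mathbf{1}_{n_1}^T y_1 \to 0; \tag{17}$$

$$\left(\sqrt{c} + \frac{1}{\sqrt{c}}\right) \left(p + q - \frac{\mu}{n}\right) \mathbf{1}_{n_2}^T y_2 \to 0. \tag{18}$$

Consequently, as $\mu = \lambda_2(L)$, at least one of the following two cases has to be satisfied:

$$\text{Case 1: } \frac{\lambda_2(L)}{n} \xrightarrow{a.s.} p + q =: t, \tag{19}$$

$$\text{Case 2: } \mathbf{1}_{n_1}^T y_1 \to 0 \text{ and } \mathbf{1}_{n_2}^T y_2 \to 0 \text{ almost surely.} \tag{20}$$

We will show that the algebraic connectivity $\lambda_2(L)/n$ and the Fiedler vector $y$ undergo a phase transition between Case 1 and Case 2 as a function of $t = p + q$. That is, a transition from Case 1 to Case 2 occurs when $p$ exceeds a certain threshold $p^*$. In Case 1, observe that asymptotically $\lambda_2(L)/n$ grows linearly with $t$ while the asymptotic Fiedler vector remains the same (unique up to its sign). Furthermore, from (10), (11), the limits (12)-(14), $\mu = \lambda_2(L)$ and $\mathbf{1}_{n_1}^T y_1 + \mathbf{1}_{n_2}^T y_2 = 0$, the Fiedler vector $y$ in Case 1 has the following property. Almost surely,

$$\frac{y_1^T (L_{S_1} + L_{N_1}) y_1}{\sqrt{n_1 n_2}} + \frac{(p + q) (\mathbf{1}_{n_1}^T y_1)^2}{\sqrt{n_1 n_2}} - \sqrt{c} (p + q)\, y_1^T y_1 \to 0, \tag{21}$$

$$\frac{y_2^T (L_{S_2} + L_{N_2}) y_2}{\sqrt{n_1 n_2}} + \frac{(p + q) (\mathbf{1}_{n_1}^T y_1)^2}{\sqrt{n_1 n_2}} - \frac{1}{\sqrt{c}} (p + q)\, y_2^T y_2 \to 0. \tag{22}$$

Footnote 1: Note that for convenience, we use $\lambda_i(M_1)$ to denote the $i$-th smallest eigenvalue of a square matrix $M_1$ and use $\sigma_i(M_2)$ to denote the $i$-th largest singular value of a rectangular matrix $M_2$.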
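The singular-value limits in (12) can be checked numerically at moderate sizes. The sketch below is only an illustration under stated assumptions: NumPy is assumed, the helper name `scaled_singular_values` is hypothetical, and a single finite-size draw only approximates the almost-sure limits.

```python
import numpy as np


def scaled_singular_values(n1, n2, p, seed=0, k=2):
    """Return the k largest singular values of C / sqrt(n1 * n2) for one Bernoulli(p) draw of C."""
    rng = np.random.default_rng(seed)
    C = (rng.random((n1, n2)) < p).astype(float)   # n1 x n2 matrix of Bernoulli(p) entries
    s = np.linalg.svd(C, compute_uv=False)         # singular values in descending order
    return s[:k] / np.sqrt(n1 * n2)


# Usage: the first value should be close to p and the second close to 0.
top_two = scaled_singular_values(300, 500, 0.3)
```

The leading value is dominated by the rank-one mean matrix $p \mathbf{1}_{n_1} \mathbf{1}_{n_2}^T$, whose only nonzero singular value is $p\sqrt{n_1 n_2}$, which is why the normalization by $\sqrt{n_1 n_2}$ is the natural one here.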


































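Adding (21) and (22), we have

$$\frac{1}{\sqrt{n_1 n_2}} \left[ y_1^T (L_{S_1} + L_{N_1}) y_1 + y_2^T (L_{S_2} + L_{N_2}) y_2 \right] + (p + q) \left[ \frac{2 (\mathbf{1}_{n_1}^T y_1)^2}{\sqrt{n_1 n_2}} - \left( \sqrt{c}\, y_1^T y_1 + \frac{1}{\sqrt{c}}\, y_2^T y_2 \right) \right] \xrightarrow{a.s.} 0. \tag{23}$$

As the two bracketed terms in (23) converge to finite constants for all $t = p + q$ in Case 1, almost surely,

$$\frac{1}{\sqrt{n_1 n_2}} \left[ y_1^T (L_{S_1} + L_{N_1}) y_1 + y_2^T (L_{S_2} + L_{N_2}) y_2 \right] \to 0; \tag{24}$$

$$\frac{2 (\mathbf{1}_{n_1}^T y_1)^2}{\sqrt{n_1 n_2}} - \left( \sqrt{c}\, y_1^T y_1 + \frac{1}{\sqrt{c}}\, y_2^T y_2 \right) \to 0. \tag{25}$$

By the PSD property of the graph Laplacian matrix, $y_1^T (L_{S_1} + L_{N_1}) y_1 + y_2^T (L_{S_2} + L_{N_2}) y_2 > 0$ if and only if $y_1$ and $y_2$ are not constant vectors. Therefore (24) implies $y_1$ and $y_2$ converge to constant vectors. By the constraints $y_1^T y_1 + y_2^T y_2 = 1$ and $\mathbf{1}_{n_1}^T y_1 + \mathbf{1}_{n_2}^T y_2 = 0$, we have, almost surely,

$$\sqrt{\frac{n n_1}{n_2}}\, y_1 \to \pm \mathbf{1}_{n_1} \quad \text{and} \quad \sqrt{\frac{n n_2}{n_1}}\, y_2 \to \mp \mathbf{1}_{n_2}. \tag{26}$$

Consequently, in Case 1 $y_1$ and $y_2$ tend to be constant vectors with opposite signs. More importantly, (26) suggests a phase transition in spectral community detectability. In Case 1, spectral clustering can almost correctly identify these two communities since $y_1$ and $y_2$ are constant vectors with opposite signs. On the other hand, in Case 2, $\mathbf{1}_{n_1}^T y_1 \to 0$ and $\mathbf{1}_{n_2}^T y_2 \to 0$ almost surely, so the entries within each of $y_1$ and $y_2$ tend to take both positive and negative values. Therefore in Case 2 spectral clustering results in very poor community detection.


IV. Upper and Lower Bounds on the Critical Value

Next we derive an upper bound on the critical value $p^*$ of the phase transition. From (1) and the block forms (3) and (4) we know that

$$\lambda_2(L) = y_1^T (L_{S_1} + D_{S_1} + L_{N_1} + D_{N_1}) y_1 - 2 y_1^T (C_S + C_N) y_2 + y_2^T (L_{S_2} + D_{S_2} + L_{N_2} + D_{N_2}) y_2 \tag{27}$$

subject to $\mathbf{1}_{n_1}^T y_1 + \mathbf{1}_{n_2}^T y_2 = 0$ and $y_1^T y_1 + y_2^T y_2 = 1$. In Case 2, since $\mathbf{1}_{n_1}^T y_1 \to 0$ and $\mathbf{1}_{n_2}^T y_2 \to 0$ almost surely, recalling the definition $\Delta_S = C_S - \bar{C}_S$ and letting $\Delta_N = C_N - \bar{C}_N$,

$$\frac{y_1^T (C_S + C_N) y_2}{\sqrt{n_1 n_2}} = \frac{y_1^T (\bar{C}_S + \bar{C}_N) y_2 + y_1^T \Delta_S y_2 + y_1^T \Delta_N y_2}{\sqrt{n_1 n_2}} \le \frac{y_1^T (\bar{C}_S + \bar{C}_N) y_2}{\sqrt{n_1 n_2}} + \|y_1\|_2 \|y_2\|_2 \cdot \frac{\sigma_1(\Delta_S) + \sigma_1(\Delta_N)}{\sqrt{n_1 n_2}} \xrightarrow{a.s.} 0 \tag{28}$$

by the fact that $\sigma_1\left(\frac{\Delta_S}{\sqrt{n_1 n_2}}\right) \xrightarrow{a.s.} 0$ and $\sigma_1\left(\frac{\Delta_N}{\sqrt{n_1 n_2}}\right) \xrightarrow{a.s.} 0$ in Appendix VII-B of [27] and $\bar{C}_S = p \mathbf{1}_{n_1} \mathbf{1}_{n_2}^T$ and $\bar{C}_N = q \mathbf{1}_{n_1} \mathbf{1}_{n_2}^T$. Furthermore, since $D_{S_1} = \mathrm{diag}(C_S \mathbf{1}_{n_2})$, $D_{S_2} = \mathrm{diag}(C_S^T \mathbf{1}_{n_1})$, $D_{N_1} = \mathrm{diag}(C_N \mathbf{1}_{n_2})$ and $D_{N_2} = \mathrm{diag}(C_N^T \mathbf{1}_{n_1})$, (12) gives, almost surely,

$$\frac{1}{n_2}\, y_1^T (D_{S_1} + D_{N_1}) y_1 \to (p + q)\, y_1^T y_1; \tag{29}$$

$$\frac{1}{n_1}\, y_2^T (D_{S_2} + D_{N_2}) y_2 \to (p + q)\, y_2^T y_2. \tag{30}$$

Therefore in Case 2 we have

$$\frac{\lambda_2(L)}{n} \xrightarrow{a.s.} \min_{x \in \mathcal{S}} \left\{ \frac{x_1^T L_1 x_1 + x_2^T L_2 x_2 + n_2 t\, x_1^T x_1 + n_1 t\, x_2^T x_2}{n} \right\}, \tag{31}$$

where $L_i = L_{S_i} + L_{N_i}$, $t = p + q$, and

$$\mathcal{S} = \left\{ x = [x_1\ x_2]^T : \mathbf{1}_{n_1}^T x_1 = \mathbf{1}_{n_2}^T x_2 = 0,\ x_1^T x_1 + x_2^T x_2 = 1 \right\}. \tag{32}$$

Define two sets

$$\mathcal{S}_1 = \left\{ x : \mathbf{1}_{n_1}^T x_1 = \mathbf{1}_{n_2}^T x_2 = 0,\ x_1^T x_1 = 1,\ x_2^T x_2 = 0 \right\}; \tag{33}$$

$$\mathcal{S}_2 = \left\{ x : \mathbf{1}_{n_1}^T x_1 = \mathbf{1}_{n_2}^T x_2 = 0,\ x_1^T x_1 = 0,\ x_2^T x_2 = 1 \right\}, \tag{34}$$

and define

$$\mu_i(L) = \min_{x \in \mathcal{S}_i} \left\{ \frac{x_1^T L_1 x_1 + x_2^T L_2 x_2 + n_2 t\, x_1^T x_1 + n_1 t\, x_2^T x_2}{n} \right\}. \tag{35}$$

Since $\mathcal{S}_1, \mathcal{S}_2 \subseteq \mathcal{S}$, we have, almost surely,

$$\begin{aligned} \frac{\lambda_2(L)}{n} &\le \min \{ \mu_1(L), \mu_2(L) \} \\ &= \min \left\{ \frac{\lambda_2(L_1) + n_2 t}{n}, \frac{\lambda_2(L_2) + n_1 t}{n} \right\} \\ &= \frac{t}{2} + \frac{\lambda_2(L_1) + \lambda_2(L_2) - |\lambda_2(L_1) - \lambda_2(L_2) + (n_2 - n_1) t|}{2n} \\ &\le \frac{t}{2} + \frac{|n_1 - n_2|\, t}{2n} + \frac{\lambda_2(L_1) + \lambda_2(L_2) - |\lambda_2(L_1) - \lambda_2(L_2)|}{2n}, \end{aligned} \tag{36}$$

where we use the facts that $\min\{a, b\} = \frac{a + b - |a - b|}{2}$ and $|a - b| \ge |a| - |b|$. Note that the last inequality in (36) holds with equality if $n_1 = n_2$. Let $t^* = p^* + q$ be the critical value for the phase transition from Case 1 to Case 2. There is a phase transition in the asymptotic value of $\lambda_2(L)/n$ since the slope of $\lambda_2(L)/n$ with respect to $t$ converges to 1 almost surely when $t \le t^*$, whereas from (36), $\frac{\lambda_2(L)}{n} \le \frac{t}{2} + \frac{|n_1 - n_2|\, t}{2n} + \frac{\lambda_2(L_1) + \lambda_2(L_2) - |\lambda_2(L_1) - \lambda_2(L_2)|}{2n}$ when $t > t^*$. From (36) we obtain an asymptotic upper bound $p_{UB}$ on the critical value $p^*$ by substituting $t^* = p^* + q$ into (36):

$$p_{UB} = \frac{\lambda_2(L_1) + \lambda_2(L_2) - |\lambda_2(L_1) - \lambda_2(L_2)|}{n - |n_1 - n_2|} - q. \tag{37}$$

To derive a lower bound on $p^*$, we have that in Case 2,

\begin{align}
\frac{\lambda_2(L)}{n} &\xrightarrow{a.s.} \min_{x \in \mathcal{S}} \left\{ \frac{x_1^T L_1 x_1 + x_2^T L_2 x_2 + n_2 t\, x_1^T x_1 + n_1 t\, x_2^T x_2}{n} \right\} \notag \\
&\ge \min_{x \in \mathcal{S}} \left\{ \frac{x_1^T L_1 x_1 + x_2^T L_2 x_2}{n} \right\} + \min_{x \in \mathcal{S}} \left\{ \frac{n_2 t\, x_1^T x_1 + n_1 t\, x_2^T x_2}{n} \right\} \tag{38} \\
&= \min \left\{ \frac{\lambda_2(L_1)}{n}, \frac{\lambda_2(L_2)}{n} \right\} + \min \left\{ \frac{n_1 t}{n}, \frac{n_2 t}{n} \right\} \notag \\
&= \frac{t}{2} - \frac{|n_1 - n_2|\, t}{2n} + \frac{\lambda_2(L_1) + \lambda_2(L_2) - |\lambda_2(L_1) - \lambda_2(L_2)|}{2n}. \tag{39}
\end{align}

Substituting $t^* = p^* + q$ into (39), we obtain an asymptotic lower bound $p_{LB}$ on the critical value $p^*$:

$$p_{LB} = \frac{\lambda_2(L_1) + \lambda_2(L_2) - |\lambda_2(L_1) - \lambda_2(L_2)|}{n + |n_1 - n_2|} - q. \tag{40}$$

Note that when $n_1 = n_2$, the equality in (38) holds. This means when $n_1 = n_2$, $\frac{\lambda_2(L)}{n} \xrightarrow{a.s.} \frac{t}{2} + \frac{\lambda_2(L_1) + \lambda_2(L_2) - |\lambda_2(L_1) - \lambda_2(L_2)|}{2n} = \frac{t}{2} + c^*$ in Case 2, and the critical value

$$p^* \xrightarrow{a.s.} \frac{\lambda_2(L_1) + \lambda_2(L_2) - |\lambda_2(L_1) - \lambda_2(L_2)|}{n} - q. \tag{41}$$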
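The bounds (37) and (40) are simple functions of the two within-community algebraic connectivities, the community sizes and the noise level. A minimal sketch follows, assuming $\lambda_2(L_1)$ and $\lambda_2(L_2)$ have already been computed; the function name `critical_value_bounds` is hypothetical.

```python
def critical_value_bounds(lam2_1, lam2_2, n1, n2, q):
    """Return (p_LB, p_UB) from (40) and (37).

    lam2_1, lam2_2: algebraic connectivities lambda_2(L_1), lambda_2(L_2),
    where L_i = L_{S_i} + L_{N_i}; n1, n2: community sizes; q: noise probability.
    """
    n = n1 + n2
    # Shared numerator; it equals 2 * min(lam2_1, lam2_2).
    numerator = lam2_1 + lam2_2 - abs(lam2_1 - lam2_2)
    p_lb = numerator / (n + abs(n1 - n2)) - q
    p_ub = numerator / (n - abs(n1 - n2)) - q
    return p_lb, p_ub


# Usage: with equal sizes the two bounds coincide, as in (41).
p_lb, p_ub = critical_value_bounds(600.0, 600.0, 2000, 2000, 0.05)
```

Because the two expressions differ only in the sign of $|n_1 - n_2|$ in the denominator, the gap between the bounds grows with the size imbalance and closes when $n_1 = n_2$.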


Fig. 1. Two communities generated by the stochastic block model [29]. The results are averaged over 100 trials, $n_1 = n_2 = 2000$, $p_1 = p_2 = 0.25$, and $q = 0.05$. The theoretical critical value from (41) is $p^* = 0.2229$.

TABLE I
Sensitivity of spectral community detection to noisy edge insertions for Amazon American political books co-purchasement data [?]. The network contains 105 nodes and 441 edges. The oracle detectability is 0.8762. The noisy edges are randomly generated for 100 trials.

| noise level ($q$) | 0 | 0.002 | 0.01 | 0.05 | 0.1 |
|---|---|---|---|---|---|
| detectability, mean | 0.8571 | 0.8548 | 0.8004 | 0.6325 | 0.5038 |
| detectability, std | 0 | 0.006 | 0.1227 | 0.1597 | 0.0823 |
| $\hat{p}_{LB}$, mean | 0.0127 | 0.0116 | 0.0076 | 0.00016 | 0 |
| $\hat{p}_{LB}$, std | 0 | 0.0021 | 0.0039 | 0.001 | 0 |
| $\hat{p}$, mean | 0.0073 | 0.0095 | 0.0173 | 0.0513 | 0.0835 |
| $\hat{p}$, std | 0 | 0.001 | 0.0025 | 0.011 | 0.0209 |
| $\hat{p}_{UB}$, mean | 0.013 | 0.0124 | 0.0633 | 0.1422 | 0.1494 |
| $\hat{p}_{UB}$, std | 0 | 0.0021 | 0.1493 | 0.3199 | 0.3213 |
| fraction of $\hat{p} < \hat{p}_{LB}$ | 1 | 0.98 | 0.01 | 0 | 0 |
| fraction of $\hat{p}_{LB} < \hat{p} < \hat{p}_{UB}$ | 0 | 0.02 | 0.75 | 0.2 | 0.2 |
| fraction of $\hat{p} > \hat{p}_{UB}$ | 0 | 0 | 0.24 | 0.8 | 0.8 |

Here we derive the bounds on the critical value $p^*$ for the stochastic block model, where the internal adjacency matrix $A_{S_i}$ in (2) is generated by an Erdős-Rényi random graph with edge connection probability $p_i$. It is proved in Appendix VII-C of [27] that $\frac{\lambda_2(L_i)}{n_i} \xrightarrow{a.s.} p_i + q$. Therefore

$$p_{UB} = \frac{c p_1 + p_2 - |c p_1 - p_2 + (c - 1) q| + |1 - c|\, q}{1 + c - |1 - c|} \quad \text{and} \quad p_{LB} = \frac{c p_1 + p_2 - |c p_1 - p_2 + (c - 1) q| - |1 - c|\, q}{1 + c + |1 - c|}.$$

When $n_1 = n_2$ (i.e., $c = 1$), the critical value $p^* \xrightarrow{a.s.} \frac{p_1 + p_2 - |p_1 - p_2|}{2}$. This suggests that in the large network limit when $n \to \infty$ and $c = 1$ the performance of spectral community detection is independent of the noise parameter $q$.
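As a worked illustration of the stochastic block model bounds, the block below restates them with $c = n_1/n_2$ and evaluates them for two parameter sets. The first set matches the simulation parameters quoted for Fig. 1; the second ($c = 2$) is an assumed example chosen only to show how unequal sizes separate the two bounds.

```latex
% Bounds for the stochastic block model, with c = n_1 / n_2:
%   p_UB = [c p_1 + p_2 - |c p_1 - p_2 + (c-1) q| + |1-c| q] / (1 + c - |1-c|)
%   p_LB = [c p_1 + p_2 - |c p_1 - p_2 + (c-1) q| - |1-c| q] / (1 + c + |1-c|)
\begin{align*}
c = 1,\ p_1 = p_2 = 0.25,\ q = 0.05: &\quad
  p_{UB} = p_{LB} = \frac{0.25 + 0.25 - 0}{2} = 0.25, \\
c = 2,\ p_1 = p_2 = 0.25,\ q = 0.05: &\quad
  p_{UB} = \frac{0.75 - 0.30 + 0.05}{3 - 1} = 0.25, \qquad
  p_{LB} = \frac{0.75 - 0.30 - 0.05}{3 + 1} = 0.10.
\end{align*}
```

In the equal-size case $q$ cancels out of the limit, whereas for $c = 2$ the noise level enters both bounds through the $|1 - c|\, q$ term.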

V. Performance Evaluation 
A. Simulated Networks 

We use the stochastic block model [29] to generate network graphs for community detection. The detectability is defined as the fraction of nodes that are correctly identified and the baseline detectability is 0.5 for random guesses. In Fig. 1, when $p_1 = p_2 = 0.25$, $n_1 = n_2 = 2000$ and $q = 0.05$, the theoretical critical value from (41) is $p^* = 0.2229$. Note that $p^*$ will converge to 0.25 as we increase $n$ as predicted in Sec. IV.
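The detectability measure can be sketched as follows. This is an interpretation and not necessarily the exact evaluation code behind the reported numbers: it assumes two groups with labels 0 and 1 and takes the better of the two possible label assignments, which is consistent with the stated baseline of 0.5. The function name `detectability` is hypothetical.

```python
def detectability(true_labels, predicted_labels):
    """Fraction of correctly identified nodes for a two-group partition.

    Assumption: the cluster labels are arbitrary, so the score is taken as the
    better of the two ways of matching predicted groups to true groups.
    """
    if len(true_labels) != len(predicted_labels) or len(true_labels) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    agree = sum(1 for t, p in zip(true_labels, predicted_labels) if t == p)
    fraction = agree / len(true_labels)
    return max(fraction, 1.0 - fraction)


# Usage: a perfect partition with swapped labels still scores 1.0.
score = detectability([0, 0, 1, 1], [1, 1, 0, 0])
```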

Fig. 1(a) verifies the phase transition in $\lambda_2(L)/n$, empirically confirming that $\lambda_2(L)/n$ approaches $p + q$ when $p \le p^*$ and approaches $\frac{p + q}{2} + c^*$ when $p > p^*$, where $c^* = \frac{\lambda_2(L_1) + \lambda_2(L_2) - |\lambda_2(L_1) - \lambda_2(L_2)|}{2n}$. Fig. 1(b) shows that the community detectability transitions from almost perfect detectability when $p \le p^*$ to low detectability when $p > p^*$. Moreover, as derived in (26), the Fiedler vector components $y_1$ and $y_2$ are constant vectors with opposite signs for $p \le p^*$, and $\mathbf{1}_{n_1}^T y_1 \to 0$ and $\mathbf{1}_{n_2}^T y_2 \to 0$ for $p > p^*$, as shown in Fig. 1(c).


B. Empirical Estimators of Phase Transition Bounds on Real- 
world Dataset 

Here we show that the critical phase transition threshold $p^*$ can be estimated from data to test the reliability of spectral community detection. Let $\hat{L}_i$ be the graph Laplacian matrix of the estimated community $i$ obtained by applying spectral clustering to the observed adjacency matrix $A$ and let $\hat{n}_i$ denote the estimated network size of community $i$. Using (37) and (40), the empirical estimators of these parameters are defined as

$$\hat{p} = \frac{\text{number of identified external edges}}{\hat{n}_1 \hat{n}_2}, \tag{42}$$

$$\hat{p}_{LB} = \frac{\lambda_2(\hat{L}_1) + \lambda_2(\hat{L}_2) - |\lambda_2(\hat{L}_1) - \lambda_2(\hat{L}_2)|}{n + |\hat{n}_1 - \hat{n}_2|}, \tag{43}$$

$$\hat{p}_{UB} = \frac{\lambda_2(\hat{L}_1) + \lambda_2(\hat{L}_2) - |\lambda_2(\hat{L}_1) - \lambda_2(\hat{L}_2)|}{n - |\hat{n}_1 - \hat{n}_2|}. \tag{44}$$

Based on these empirical estimates, the performance of community detection can be classified into three categories. If $\hat{p} < \hat{p}_{LB}$, the network is in the reliable detection region. If $\hat{p}_{LB} < \hat{p} < \hat{p}_{UB}$, the network is in the intermediate detection region. If $\hat{p} > \hat{p}_{UB}$, the network is in the unreliable detection region.
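The estimator (42) and the three-way classification can be sketched as below. The sketch assumes an edge list over integer node indices and 0/1 community labels from the detector; the function names `estimate_external_probability` and `detection_regime` are hypothetical, and assigning the boundary cases to the intermediate region is an assumption, since the text only specifies strict inequalities.

```python
def estimate_external_probability(edges, labels):
    """p_hat of (42): identified external edges divided by n1_hat * n2_hat."""
    n1_hat = sum(1 for label in labels if label == 0)
    n2_hat = len(labels) - n1_hat
    if n1_hat == 0 or n2_hat == 0:
        raise ValueError("both estimated communities must be non-empty")
    # An edge is external if its endpoints were assigned to different communities.
    external = sum(1 for i, j in edges if labels[i] != labels[j])
    return external / (n1_hat * n2_hat)


def detection_regime(p_hat, p_lb_hat, p_ub_hat):
    """Classify the operating regime of the detector from the empirical estimates."""
    if p_hat < p_lb_hat:
        return "reliable"
    if p_hat > p_ub_hat:
        return "unreliable"
    return "intermediate"   # assumption: ties with a bound are counted here


# Usage with the q = 0 means reported in Table I.
regime = detection_regime(0.0073, 0.0127, 0.013)
```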

The co-purchasement data between 105 American political books sold on Amazon [?] are used to estimate the parameters $\hat{p}_{LB}$, $\hat{p}_{UB}$ and $\hat{p}$. In the corresponding network graph, nodes represent political books and edges represent co-purchasements. An edge exists between two books if they are frequently purchased by the same buyer. Three labels, liberal, conservative and neutral, were determined by Newman [36]. We perform community detection by separating the books into two groups since there are only 13 books with neutral labels (i.e., the oracle detectability is 0.8762). To investigate the sensitivity of spectral community detection to noisy edge insertions, for each edge not present in the original graph, an edge is added with probability $q$. The community detection results are summarized in Table I. Observe that for small $q$ ($q = 0$ or $0.002$) the network is mostly in the reliable detection region ($\hat{p} < \hat{p}_{LB}$), which indicates that spectral community detection achieves high detectability. When $q = 0.01$, the network is mostly in the intermediate detection region ($\hat{p}_{LB} < \hat{p} < \hat{p}_{UB}$), indicating that the community detectability has large variation. When $q$ is large ($q = 0.05$ or $0.1$), the network is mostly in the unreliable detection region, resulting in low detectability. The large standard deviation of $\hat{p}_{UB}$ for large $q$ is due to the fact that spectral community detection may mistakenly detect two communities with extremely imbalanced community sizes such that the denominator of the estimator $\hat{p}_{UB}$ is small.
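The noisy edge insertion used in this experiment can be sketched with the standard library alone. The sketch assumes an undirected simple graph given as an edge list on nodes $0, \dots, n-1$; the function name `insert_noisy_edges` and the `seed` parameter are hypothetical.

```python
import random
from itertools import combinations


def insert_noisy_edges(n, edges, q, seed=None):
    """Add each absent edge independently with probability q; existing edges are kept."""
    rng = random.Random(seed)
    present = {frozenset(edge) for edge in edges}
    noisy = set(present)
    for i, j in combinations(range(n), 2):
        pair = frozenset((i, j))
        # Only node pairs without an edge in the original graph are candidates.
        if pair not in present and rng.random() < q:
            noisy.add(pair)
    return sorted(tuple(sorted(pair)) for pair in noisy)


# Usage: q = 0 leaves the graph unchanged, q = 1 yields the complete graph.
unchanged = insert_noisy_edges(4, [(0, 1), (1, 2)], 0.0)
```

Repeating the call with different seeds gives independent noisy realizations, which is how a set of trials such as the 100 reported in Table I could be generated.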

VI. Conclusion 

We establish asymptotic phase transition bounds on the critical value $p^*$ under a general network setting corrupted by an Erdős-Rényi type noise model. The communities are proven to be almost perfectly
detectable below the phase transition threshold and to be undetectable 
above the phase transition threshold. The phase transition bounds 
are used to establish empirical estimators to evaluate the reliability 
of spectral community detection, where the detector is said to be 
operating in the reliable, intermediate, or unreliable detection regime 
based on the empirical estimates. Simulated networks generated 
by the stochastic block model validate the phase transition theory 
for community detectability. An empirical estimator of the phase 
transition is proposed that can be used to explore sensitivity of the 
spectral community detection algorithm on real data. 

